Introduction


The SA-110 is the first StrongARM implementation of the ARM architecture. This document
provides an overview of memory management followed by details specific to the ARM
architecture and the SA-110 implementation itself. The MMU (memory management unit) model
for the ARM architecture is described along with its relationship to cache and write buffer control.
Behavior of the SA-110 is then discussed, and sample code is provided for a simple one-to-one
mapped virtual-to-physical translation.


Remember that the ARM Systems Architecture Manual and the SA-110 Technical Reference 
Manual remain the definitive texts for how the device operates, and provide fuller explanations in 
many cases. 


Memory Management Concepts 


Memory management can be described as the ability to manage the system address space, typically 
using a blend of software and dedicated hardware. A memory-managed address space as seen by 
the program is often referred to as a virtual address (VA) space, which is then translated into a 
physical address (PA) prior to accessing memory or IO. 


Memory management provides three functions:

• Access control
• Relocation (address translation)
• Consistent state allowing a faulting access to be corrected and replayed by an exception
  handler with the same result as a normally completing access. This is the key feature required
  for a demand-paged memory system.


Translation is performed using page tables and may involve multiple steps, each step providing 
finer granularity on the translated page size. Figure 1 illustrates a two-level lookup. A predefined
translation base is merged with the first-level index to provide the address of a page table entry 
(PTE). The entry contains the base of the second-level table along with any associated control 
information required by the MMU. Combining this base with the second-level index from the 
original VA provides an address for the second-level PTE. This entry then provides the physical 
base, which is concatenated with the PA index to provide the required physical address. Other 
fields in the PTE are used for access control according to the MMU model. 

Figure 1. Virtual-to-Physical Address Translation


The process of performing the above translation is commonly known as table walking, and may be
performed in hardware or software. For performance reasons, microprocessors implement a cache
of VA=>PA entries in translation lookaside buffers (TLBs) as part of the MMU.


Many schemes can be implemented using the principles outlined. Multitasking systems can 
implement virtual address spaces on a per-process basis, each with their own page tables, or share a 
single space across all processes. Protection can be used to prohibit access for both privileged and 
nonprivileged program execution. Page faults result in access aborts where it is up to the exception 
handler to determine the cause of the fault, and take the appropriate corrective action (for example, 
map in the requested page from disk to physical memory on an application page miss). The onus is 
normally on MMU software to keep all the TLBs and translation tables consistent and avoid 
translation conflicts. 
2.0 The ARM Architecture - MMU


The ARM architecture is currently at Version 4. The evolution through the four versions is 
summarized in the preface to the ARM Architecture Reference Manual. Version 4 of the 
architecture introduced support for what is known as the Harvard Architecture - a computer 
architecture model with separate instruction and data paths to memory. 


The main features of the ARM MMU architecture are as follows:

• All system architecture functions are controlled by reading or writing an ARM register (Rd)
  from/to a block of sixteen 32-bit registers (Rn) accessed at coprocessor number (cp_num) 15.
  Control bits are transferred within the opcode_1, opcode_2, and Rm instruction fields where
  necessary.

  The instruction format is:

  MRC/MCR p<cp_num>, <opcode_1>, Rd, cRn, cRm, <opcode_2>

• A single set of tables is used for both instruction and data fetches irrespective of whether a
  Harvard or Von Neumann (unified address and data access stream) architecture is
  implemented.

• Caches (instruction, data, or unified) are controlled through a combination of the system
  control coprocessor register and control bits in the page tables.

• 1 MB sections are supported through a single-level lookup.

• 64 KB or 4 KB pages are supported by a second-level lookup.

• Level 1 page tables consist of 4096 32-bit entries (16 KB table size).

• Level 2 page tables consist of 256 32-bit entries (1 KB table size) for each section. 64 KB
  page table entries are replicated sixteen times each within the table. This is because the level 2
  table and PA page indices overlap by four bits in the VA.

• Sections and pages can be protected using access permissions (AP bits) with no_access,
  read_only, or read_write for supervisor and user modes.

• Domains can be used to provide an additional level of protection across arbitrary PTEs.
  Domain settings determine whether access is enabled or not, and if enabled, whether the
  access permission checks are invoked or bypassed.

• Control bits are provided in the PTEs for enabling caching and write buffering on a
  section/page basis.
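
The table sizes listed above fix how a 32-bit virtual address is split into indices. The following C sketch is illustrative only; the function names are hypothetical and not taken from any ARM or Intel header. It shows the level 1 index (4096 entries), the level 2 index (256 entries), the three possible offsets, and why a 64 KB page occupies sixteen consecutive level 2 slots.

```c
#include <assert.h>
#include <stdint.h>

/* Level 1 index: VA[31:20] selects one of 4096 entries (one per 1 MB). */
static inline uint32_t l1_index(uint32_t va) { return va >> 20; }

/* Level 2 index: VA[19:12] selects one of 256 entries (one per 4 KB). */
static inline uint32_t l2_index(uint32_t va) { return (va >> 12) & 0xFFu; }

/* Offsets within a 1 MB section, a 64 KB large page, and a 4 KB small page. */
static inline uint32_t section_offset(uint32_t va)    { return va & 0x000FFFFFu; }
static inline uint32_t large_page_offset(uint32_t va) { return va & 0x0000FFFFu; }
static inline uint32_t small_page_offset(uint32_t va) { return va & 0x00000FFFu; }

/* Byte address of the level 1 entry: table base merged with the index. */
static inline uint32_t l1_entry_address(uint32_t table_base, uint32_t va)
{
    assert((table_base & 0x3FFFu) == 0);   /* 16 KB table, naturally aligned */
    return table_base | (l1_index(va) << 2);
}

/* Byte address of the level 2 entry within a 1 KB level 2 table. */
static inline uint32_t l2_entry_address(uint32_t l2_base, uint32_t va)
{
    assert((l2_base & 0x3FFu) == 0);       /* 1 KB table, naturally aligned */
    return l2_base | (l2_index(va) << 2);
}

/* VA[15:12] belongs to both the level 2 index and the large page offset,
 * so the sixteen slots sharing VA[19:16] must all hold the same entry.
 * This returns the first of those sixteen slots. */
static inline uint32_t large_page_first_slot(uint32_t va)
{
    return l2_index(va) & ~0xFu;
}
```

For example, VA 0x12345678 uses level 1 entry 0x123 and level 2 entry 0x45; if that entry describes a large page, slots 0x40 through 0x4F must hold identical entries.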

The first- and second-level page table entries are illustrated in Figure 2 and Figure 3,
respectively.


Figure 2. Level 1 Page Table Format

Entry        Fields (bit range)                                              Bits [1:0]
Fault        SBZ [31:2]                                                      0 0
Page table   Page Table Base Address [31:10], SBZ [9], Domain [8:5],
             IMP [4], SBZ [3:2]                                              0 1
Section      Section Base Address [31:20], SBZ [19:12], AP [11:10],
             SBZ [9], Domain [8:5], IMP [4], C [3], B [2]                    1 0
Reserved     SBZ [31:2]                                                      1 1


Figure 3. Level 2 Page Table Format

Entry        Fields (bit range)                                              Bits [1:0]
Fault        SBZ [31:2]                                                      0 0
Large page   Large Page Base Address [31:16], SBZ [15:12], AP3 [11:10],
             AP2 [9:8], AP1 [7:6], AP0 [5:4], C [3], B [2]                   0 1
Small page   Small Page Base Address [31:12], AP3 [11:10], AP2 [9:8],
             AP1 [7:6], AP0 [5:4], C [3], B [2]                              1 0
Reserved     SBZ [31:2]                                                      1 1
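
The entry formats can be assembled from their fields as in the C sketch below. It is a minimal illustration, assuming the standard ARM V4 bit positions (section AP in bits 11:10, domain in bits 8:5, C in bit 3, B in bit 2); the helper names are hypothetical. The implementation-defined bit 4 (IMP) is left as a caller-supplied parameter, because its required value should be taken from the SA-110 Technical Reference Manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Entry type codes held in bits [1:0]. */
enum { L1_TYPE_PAGE_TABLE = 0x1, L1_TYPE_SECTION = 0x2 };
enum { L2_TYPE_LARGE_PAGE = 0x1, L2_TYPE_SMALL_PAGE = 0x2 };

/* Level 1 section entry mapping 1 MB at physical address pa. */
static inline uint32_t make_section(uint32_t pa, unsigned ap, unsigned domain,
                                    bool imp, bool c, bool b)
{
    assert((pa & 0x000FFFFFu) == 0);       /* section base is 1 MB aligned */
    assert(ap < 4 && domain < 16);
    return pa | ((uint32_t)ap << 10) | ((uint32_t)domain << 5)
              | ((uint32_t)imp << 4) | ((uint32_t)c << 3)
              | ((uint32_t)b << 2) | L1_TYPE_SECTION;
}

/* Level 1 entry pointing at a 1 KB level 2 table. */
static inline uint32_t make_page_table(uint32_t l2_base, unsigned domain, bool imp)
{
    assert((l2_base & 0x3FFu) == 0);       /* level 2 table is 1 KB aligned */
    assert(domain < 16);
    return l2_base | ((uint32_t)domain << 5) | ((uint32_t)imp << 4)
                   | L1_TYPE_PAGE_TABLE;
}

/* Copy one 2-bit AP value into AP3..AP0 (bits [11:4]) so that every
 * subpage carries the same permissions. */
static inline uint32_t replicate_ap(unsigned ap)
{
    assert(ap < 4);
    return ((uint32_t)ap * 0x55u) << 4;
}

/* Level 2 entry for a 64 KB large page. */
static inline uint32_t make_large_page(uint32_t pa, unsigned ap, bool c, bool b)
{
    assert((pa & 0xFFFFu) == 0);
    return pa | replicate_ap(ap) | ((uint32_t)c << 3) | ((uint32_t)b << 2)
              | L2_TYPE_LARGE_PAGE;
}

/* Level 2 entry for a 4 KB small page. */
static inline uint32_t make_small_page(uint32_t pa, unsigned ap, bool c, bool b)
{
    assert((pa & 0xFFFu) == 0);
    return pa | replicate_ap(ap) | ((uint32_t)c << 3) | ((uint32_t)b << 2)
              | L2_TYPE_SMALL_PAGE;
}
```

The addresses passed to these helpers are caller-chosen examples; nothing here is specific to the EBSA-110 memory map.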























Apart from the performance impact, translation is transparent to the program until an exception 
occurs, when the processor will enter abort mode and start executing from either the prefetch_abort or 
data_abort exception vectors. Table 1 provides a summary of all exceptions and their priorities. 


Table 1. Exception Vectors and Their Priorities

Exception Type                                   Exception Mode   Vector Address   Priority
Reset                                            SVC              0x00000000       1 (highest)
Undefined instruction                            UNDEF            0x00000004       6
SW interrupt (SWI)                               SVC              0x00000008       6
Prefetch abort (instruction fetch memory abort)  ABORT            0x0000000C       5
Data abort (data access memory abort)            ABORT            0x00000010       2
IRQ (interrupt)                                  IRQ              0x00000018       4
FIQ (fast interrupt)                             FIQ              0x0000001C       3








Data aborts are prioritized above FIQs to ensure the correct capture of any faulting status that
may occur coincident with a fast interrupt. Fast interrupt status is unchanged on entry to the
data abort exception, meaning that a FIQ exception (assuming FIQs are enabled) will execute
immediately on entry to the data abort handler, should this scenario occur.
2.1 The Coprocessor Interface


The first 9 of 16 coprocessor 15 (cp15) registers are architected as shown in Table 2.


Table 2. CP15 Register Summary

Register   Reads                      Writes                     MMU Function
0          ID register                UNPREDICTABLE              -
1          Control                    Control                    X
2          Translation Table Base     Translation Table Base     X
3          Domain Access Control      Domain Access Control      X
4          UNPREDICTABLE              UNPREDICTABLE              -
5          Fault Status               Fault Status               X
6          Fault Address              Fault Address              X
7          Cache Operations           Cache Operations           X
8          TLB Operations             TLB Operations             X
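
Each register in Table 2 is reached with an MCR (write) or MRC (read) instruction that names coprocessor 15. As a cross-check when reading disassembly, the C sketch below builds the 32-bit instruction word for an unconditional transfer using the standard ARM coprocessor register transfer encoding. The function names are hypothetical, and the sketch is not a substitute for an assembler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode an always-executed coprocessor register transfer.
 * is_read = true gives MRC (coprocessor to ARM register);
 * is_read = false gives MCR (ARM register to coprocessor). */
static inline uint32_t encode_cp_transfer(bool is_read, unsigned cp_num,
                                          unsigned opcode_1, unsigned rd,
                                          unsigned crn, unsigned crm,
                                          unsigned opcode_2)
{
    assert(cp_num < 16 && rd < 16 && crn < 16 && crm < 16);
    assert(opcode_1 < 8 && opcode_2 < 8);

    return 0xE0000000u                      /* condition field: always      */
         | 0x0E000010u                      /* bits 27:24 = 1110, bit 4 = 1 */
         | ((uint32_t)opcode_1 << 21)
         | ((uint32_t)(is_read ? 1u : 0u) << 20)
         | ((uint32_t)crn << 16)
         | ((uint32_t)rd << 12)
         | ((uint32_t)cp_num << 8)
         | ((uint32_t)opcode_2 << 5)
         | (uint32_t)crm;
}

/* Write ARM register rd to cp15 register crn (opcode_1 is 0). */
static inline uint32_t encode_cp15_write(unsigned rd, unsigned crn,
                                         unsigned crm, unsigned opcode_2)
{
    return encode_cp_transfer(false, 15, 0, rd, crn, crm, opcode_2);
}

/* Read cp15 register crn into ARM register rd (opcode_1 is 0). */
static inline uint32_t encode_cp15_read(unsigned rd, unsigned crn,
                                        unsigned crm, unsigned opcode_2)
{
    return encode_cp_transfer(true, 15, 0, rd, crn, crm, opcode_2);
}
```

For instance, a control register write from R0 is encode_cp15_write(0, 1, 0, 0), and the matching read is encode_cp15_read(0, 1, 0, 0).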

The control register includes bits for:

• Enabling the MMU
• Enabling address alignment fault checking
• Cache enables - unified/data and instruction (when separate I and D) bits
• Write buffer enable
• System (S) and ROM (R) protection bits

Please see the SA-110 reference manual for a fuller description and complete listing of this register.
The S and R bits are used in conjunction with the AP bits in the page table entries to resolve the
protection for a given mode within a domain according to Table 3.

Table 3. Access Permissions

AP   S   R   Supervisor Mode   User Mode
00   0   0   No Access         No Access
00   1   0   Read Only         No Access
00   0   1   Read Only         Read Only
00   1   1   UNPREDICTABLE     UNPREDICTABLE
01   X   X   Read/Write        No Access
10   X   X   Read/Write        Read Only
11   X   X   Read/Write        Read/Write

The Domain Access Control Register provides sixteen pairs of control bits, which are used to
qualify the domain field in a PTE according to Table 4.

Table 4. Domain Access Values

Domain_ctl   Access      Description
00           No access   All accesses will cause a domain fault exception
01           Client      Accesses checked against the AP bits (see Table 3)
10           Reserved    UNPREDICTABLE
11           Manager     No access permission checks
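
The way the domain check and the access permission check combine can be expressed as a small decision function. The C sketch below is a software model of Table 3 and Table 4 only; it assumes that domain n is controlled by bits [2n+1:2n] of the Domain Access Control Register, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    ACCESS_OK,
    FAULT_DOMAIN,
    FAULT_PERMISSION,
    ACCESS_UNPREDICTABLE
} access_result;

/* Access permission check for a client domain (Table 3). */
static inline access_result check_ap(unsigned ap, bool s, bool r,
                                     bool privileged, bool is_write)
{
    assert(ap < 4);
    switch (ap) {
    case 0:
        if (s && r)          return ACCESS_UNPREDICTABLE;
        if (is_write)        return FAULT_PERMISSION; /* AP=00 never grants writes */
        if (r)               return ACCESS_OK;        /* read only, both modes     */
        if (s && privileged) return ACCESS_OK;        /* read only, supervisor     */
        return FAULT_PERMISSION;
    case 1:  /* supervisor read/write, user no access */
        return privileged ? ACCESS_OK : FAULT_PERMISSION;
    case 2:  /* supervisor read/write, user read only */
        return (privileged || !is_write) ? ACCESS_OK : FAULT_PERMISSION;
    default: /* read/write in both modes */
        return ACCESS_OK;
    }
}

/* Domain check (Table 4), followed by the AP check for client domains. */
static inline access_result check_access(uint32_t domain_access_control,
                                         unsigned domain, unsigned ap,
                                         bool s, bool r,
                                         bool privileged, bool is_write)
{
    assert(domain < 16);
    unsigned ctl = (domain_access_control >> (2u * domain)) & 0x3u;

    switch (ctl) {
    case 0:  return FAULT_DOMAIN;                              /* no access */
    case 1:  return check_ap(ap, s, r, privileged, is_write);  /* client    */
    case 2:  return ACCESS_UNPREDICTABLE;                      /* reserved  */
    default: return ACCESS_OK;                                 /* manager   */
    }
}
```

With a register value of 1, domain 0 is a client and domains 1 through 15 fault on every access, which is the configuration used by the sample code in Appendix A.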














Data aborts update two cp15 registers. Cp15_6 is updated with the faulting virtual address. Cp15_5
is updated with fault status as summarized in Table 5 and Table 6.


Table 5. Cp15 Fault Status Register Format

[31:9]               [8]   [7:4]    [3:0]
UNPREDICTABLE/SBZ    0     domain   status (FS)


Table 6. Fault Status Encoding

Priority   Sources                                    Domain    FS[3:0]
highest    Terminal exception                         invalid   0b0010
           Vector exception                           invalid   0b0000
           Alignment                                  invalid   0b00x1
           Ext. abort on translation     1st level    invalid   0b1100
                                         2nd level    valid     0b1110
           Translation                   section      invalid   0b0101
                                         page         valid     0b0111
           Domain                        section      valid     0b1001
                                         page         valid     0b1011
           Permission                    section      valid     0b1101
                                         page         valid     0b1111
           Ext. abort on linefetch       section      valid     0b0100
                                         page         valid     0b0110
           Ext. abort on non-linefetch   section      valid     0b1000
lowest                                   page         valid     0b1010

Processors may only implement a subset of these encodings. 
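
A data abort handler typically starts by splitting the fault status value into the fields of Table 5 and classifying it with Table 6. The C sketch below shows one way to do that on a value already read from cp15_5. The enum and function names are hypothetical, and a given processor may never produce some of these encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    FS_VECTOR_EXCEPTION,
    FS_ALIGNMENT,
    FS_TERMINAL_EXCEPTION,
    FS_EXT_ABORT_TRANSLATION_L1,
    FS_EXT_ABORT_TRANSLATION_L2,
    FS_TRANSLATION_SECTION,
    FS_TRANSLATION_PAGE,
    FS_DOMAIN_SECTION,
    FS_DOMAIN_PAGE,
    FS_PERMISSION_SECTION,
    FS_PERMISSION_PAGE,
    FS_EXT_ABORT_LINEFETCH_SECTION,
    FS_EXT_ABORT_LINEFETCH_PAGE,
    FS_EXT_ABORT_NONLINEFETCH_SECTION,
    FS_EXT_ABORT_NONLINEFETCH_PAGE
} fault_source;

typedef struct {
    fault_source source;   /* decoded from FS[3:0]                       */
    unsigned domain;       /* bits [7:4] of the fault status register    */
    bool domain_valid;     /* false when Table 6 marks the field invalid */
} fault_info;

/* Decode a fault status value using Table 5 (fields) and Table 6 (codes). */
static inline fault_info decode_fault_status(uint32_t fault_status)
{
    static const struct { fault_source source; bool domain_valid; } table[16] = {
        [0x0] = { FS_VECTOR_EXCEPTION,               false },
        [0x1] = { FS_ALIGNMENT,                      false },
        [0x2] = { FS_TERMINAL_EXCEPTION,             false },
        [0x3] = { FS_ALIGNMENT,                      false },
        [0x4] = { FS_EXT_ABORT_LINEFETCH_SECTION,    true  },
        [0x5] = { FS_TRANSLATION_SECTION,            false },
        [0x6] = { FS_EXT_ABORT_LINEFETCH_PAGE,       true  },
        [0x7] = { FS_TRANSLATION_PAGE,               true  },
        [0x8] = { FS_EXT_ABORT_NONLINEFETCH_SECTION, true  },
        [0x9] = { FS_DOMAIN_SECTION,                 true  },
        [0xA] = { FS_EXT_ABORT_NONLINEFETCH_PAGE,    true  },
        [0xB] = { FS_DOMAIN_PAGE,                    true  },
        [0xC] = { FS_EXT_ABORT_TRANSLATION_L1,       false },
        [0xD] = { FS_PERMISSION_SECTION,             true  },
        [0xE] = { FS_EXT_ABORT_TRANSLATION_L2,       true  },
        [0xF] = { FS_PERMISSION_PAGE,                true  },
    };

    fault_info info;
    info.source       = table[fault_status & 0xFu].source;
    info.domain_valid = table[fault_status & 0xFu].domain_valid;
    info.domain       = (fault_status >> 4) & 0xFu;
    return info;
}
```

Bits [31:9] are masked out by the decode, so the caller does not need to clear them first.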
The SA-110 Implementation


The SA-110 has been implemented to V4 of the ARM architecture. It is the first ARM processor to 
adopt a Harvard Architecture and has the following features: 


• A virtually addressed, 16 KB instruction cache. The instruction cache is 32-way set
  associative, with a 32-byte cache line size. No coherence is maintained with main memory.
  Replacement uses a round-robin algorithm.

• A virtually addressed, 16 KB data cache. The data cache is a 32-way set associative writeback
  cache, with a 32-byte cache line size. Each entry has an associated valid bit and two dirty bits,
  allowing victim writes to be resolved to half lines for more efficient usage of system bandwidth
  to main memory. All victim writes occur through the write buffer. Data cache entries are only
  allocated on reads and use a round-robin replacement algorithm. The cache entries also include
  physical tag bits to allow writebacks without additional address translation.

• Fully associative instruction TLBs. These are updated using a round-robin algorithm.

• Fully associative data TLBs. These are updated using a round-robin algorithm.

• Eight 16-byte entry write buffers. These are used by the writeback cache as well as for
  handling buffered writes that miss the data cache. Each entry includes physical address, data,
  and byte mask information.


The SA-110 implements a five-stage pipeline:

Fetch       Fetches an instruction from the Icache or memory. An Icache miss stalls the fetch
            stage of the pipeline until the full cache line (eight memory fetches) is completed.
            Instruction fetch permission checks occur here and are flagged; however, the
            exception is taken only if the instruction enters the decode stage. Branching may
            mean that the faulting condition never occurs.

Decode      Decodes the instruction and reads input values from the register file.

Execute     Executes shifts and arithmetic operations. Multiplies start in this stage.

Buffer      Performs data cache or memory accesses. The integer multiplier completes execution
            in this stage; it retires 12 bits per clock. Results from the execute stage are
            buffered here, but are available via bypasses in most cases. Translation and
            permission checks occur in this stage for data accesses, and will fault immediately
            if an exception is generated.

Writeback   Updates the register file with memory or result data.


All translation occurs automatically once the tables have been set up, the coprocessor registers 
initialized, and the MMU enabled. Any of the control register functions (cp15_1 register writes) 
can be enabled in parallel; it is not necessary to perform a separate read-modify-write sequence to 
enable the MMU. Table walks will generate 32-bit reads from external memory. These reads do not 
check the write buffer, assuming that any updates to the tables have been flushed before translation 
uses the modified entries. 


MMU and the Caches 


The PTEs contain C and B bits for enabling caching and write buffering, respectively. While parts 
of the memory subsystem can be set up as bufferable but noncachable, the B bit should always be 
set for cachable D-space. This is because the write buffer is inherently used for writebacks. The 
SA-110 will automatically buffer cachable writes irrespective of the state of the B bit. 


Intel Application Note: Memory Management on the StrongARM SA-110


3.1.1 Instruction Caching


The Icache is always checked, even when disabled. Permission checks will occur if the MMU is 
enabled, otherwise, all hits will be taken. The Icache is enabled by setting bit 12 in cp15_1. 


If the MMU is enabled, instruction caching is dependent on the state of the C bit in the PTE. If the 
MMU is disabled, instruction fetches are considered cachable by default; instruction fetches from 
memory are allocated to an entry if the Icache is enabled. 


Instructions can be locked into the Icache by running a program with the Icache enabled, then 
disabling the Icache to stop any Icache updates. The other option is to bound cachable code in VA 
space such that no reallocation occurs, or it is minimized to a system-defined subset of the code. 
Additional code paths are then always fetched from noncachable VA space. 


3.1.2 Data Caching


As with the Icache, the Dcache is always checked, even when disabled. It is enabled by setting bit
2 in cp15_1.


The MMU must be enabled and the C bit set for entries to be allocated in the Dcache. Dcache
entries can be locked in a similar manner to the Icache, or by disabling the MMU with valid entries
in the Dcache. It is not possible to lock portions of the cache.


Cache Management 


When the MMU is disabled, the cache tags equate to physical addresses. Any mapping changes
of virtual to physical addresses must ensure that the caches are flushed appropriately. Remapping
can occur when the MMU is being enabled, when it is being disabled, or while it is enabled.
Coprocessor writes to cp15_7 provide mechanisms to manage the caches as summarized in Table 7.


Table 7. Cache Control Operations

Function             Opcode_2   Rm       Data
Flush I+D            0b000      0b0111   Ignored
Flush I              0b000      0b0101   Ignored
Flush D              0b000      0b0110   Ignored
Flush Dcache entry   0b001      0b0110   Virtual address
Clean Dcache entry   0b001      0b1010   Virtual address
Drain write buffer   0b100      0b1010   Ignored




















The whole Icache is flushed by a single instruction, whereas the Dcache can be flushed collectively
or on a single-entry basis. To ensure that memory coherence is maintained, Dcache entries need to
be cleaned prior to flushing. A loop is required to clean all Dcache entries. Loads from a
read-only range of virtual addresses can be used as an alternative to the coprocessor clean
instruction for this purpose, because each allocation evicts an existing entry. A clean loop needs
to be terminated with a drain write buffer command to ensure that all the victims are written to
main memory.
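
Why a read loop cleans the Dcache can be seen in a small software model. The C sketch below is a simplified simulation, not a description of the SA-110 hardware: it assumes conventional set indexing (16 sets selected by address bits above the 32-byte line offset) and one round-robin pointer per set. All names are hypothetical. Reading one cache-sized block of otherwise unused addresses allocates 32 new lines in every set, so every previously dirty line becomes a victim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum {
    LINE_BYTES  = 32,
    WAYS        = 32,
    CACHE_BYTES = 16 * 1024,
    SETS        = CACHE_BYTES / (LINE_BYTES * WAYS)   /* 16 sets */
};

typedef struct { bool valid; bool dirty; uint32_t tag; } cache_line;

typedef struct {
    cache_line line[SETS][WAYS];
    unsigned next_way[SETS];   /* round-robin pointer per set (assumed) */
    unsigned writebacks;       /* number of dirty victims written back  */
} dcache_model;

static inline void dcache_reset(dcache_model *c) { memset(c, 0, sizeof *c); }

/* One access: a write hit marks the line dirty; only read misses allocate. */
static inline void dcache_access(dcache_model *c, uint32_t va, bool is_write)
{
    unsigned set = (va / LINE_BYTES) % SETS;
    uint32_t tag = va / (LINE_BYTES * SETS);

    for (unsigned w = 0; w < WAYS; w++) {
        cache_line *l = &c->line[set][w];
        if (l->valid && l->tag == tag) {
            if (is_write) l->dirty = true;
            return;
        }
    }
    if (is_write) return;      /* a write miss goes to the write buffer */

    cache_line *victim = &c->line[set][c->next_way[set]];
    if (victim->valid && victim->dirty) c->writebacks++;
    victim->valid = true;
    victim->dirty = false;
    victim->tag = tag;
    c->next_way[set] = (c->next_way[set] + 1) % WAYS;
}

static inline unsigned dcache_dirty_lines(const dcache_model *c)
{
    unsigned n = 0;
    for (unsigned s = 0; s < SETS; s++)
        for (unsigned w = 0; w < WAYS; w++)
            if (c->line[s][w].valid && c->line[s][w].dirty) n++;
    return n;
}

/* Clean loop: read one cache-sized block from a reserved address range. */
static inline void dcache_clean_by_reads(dcache_model *c, uint32_t clean_base)
{
    for (unsigned i = 0; i < CACHE_BYTES / LINE_BYTES; i++)
        dcache_access(c, clean_base + i * LINE_BYTES, false);
}
```

The model relies on the clean range never hitting in the cache, which is why the sample code in Appendix A reserves a dedicated virtual address range for this loop.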


Cache flushes may be necessary for several reasons:

• Copying code from ROM to RAM prior to execution.

• Self-modifying code. In this case the modifications will occur as data, but execution will occur
  from the separate instruction stream.

• Context switches involving changes to the memory map.

TLB Management


As with the caches, it is important to flush potentially conflicting entries when a context switch
occurs. Four coprocessor write instructions to cp15_8 are provided to manage the TLBs as
illustrated in Table 8. It is again up to the MMU software to ensure that translation correctness is
maintained when PTEs are modified.


Table 8. TLB Control Operations

Function        Opcode_2   Rm       Data
Flush I+D       0b000      0b0111   Ignored
Flush I         0b000      0b0101   Ignored
Flush D         0b000      0b0110   Ignored
Flush D entry   0b001      0b0110   Virtual address




















3.3 MMU Fault Handling


ARM supports five types of exceptions as summarized in Table 1:

• Two levels of interrupt (IRQ and FIQ)
• Memory aborts
• Undefined instruction
• Software interrupts (SWIs used for OS syscalls)


Memory abort is the exception mechanism applicable here. The abort mode may be entered from
one of two exception vectors, depending on whether it was an instruction or data fetch that caused
the fault. The MMU can generate a memory abort for four reasons:

• An alignment fault on word or halfword loads or stores when the two or one least significant
  address bits, respectively, are nonzero. Alignment faults are enabled using bit 1 of cp15_1.

• A translation fault when the PTE accessed is marked invalid.

• A domain fault when access is disallowed by the domain protection in the current mode.

• A permission fault when access is disallowed by the access permission (AP) bits in the current
  mode.


An external abort pin can also be used to cause a data abort for instruction reads, data reads, PTE
reads, unbuffered writes, or lock cycles on the system bus.
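
The alignment rule in the first bullet reduces to a mask test on the low address bits. The C sketch below states it as a predicate over a control register value and an access; the constant and function names are hypothetical, and only the bit position given in the text (bit 1 of cp15_1) is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Alignment fault enable: bit 1 of the cp15 control register (cp15_1). */
#define CONTROL_ALIGNMENT_FAULT_ENABLE (1u << 1)

/* Returns true when a load or store of access_bytes (1, 2, or 4) at the
 * given address would raise an alignment fault. */
static inline bool alignment_fault(uint32_t control, uint32_t address,
                                   unsigned access_bytes)
{
    assert(access_bytes == 1 || access_bytes == 2 || access_bytes == 4);

    if ((control & CONTROL_ALIGNMENT_FAULT_ENABLE) == 0)
        return false;                      /* checking disabled */

    /* Word: two low bits must be zero. Halfword: one low bit. Byte: none. */
    return (address & (access_bytes - 1u)) != 0;
}
```

A word access to an address ending in 0x2 faults, while a halfword access to the same address does not.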

3.3.1 Prefetch Abort


Prefetch aborts occur as a result of an external abort, translation fault, or protection violation. They
are flagged at the fetch stage, but will only cause an exception if the faulting instruction is about to
execute. It is up to the prefetch exception handler to recover the faulting address from the link
register (R14 minus 4 for the SA-110) and use this in conjunction with the relevant cp15 registers
to determine the cause of the fault and how to recover. Neither the fault status (cp15_5) nor the
fault address (cp15_6) is updated on prefetch aborts.


Once the fetch stage has seen an abort, no other instructions will be prefetched until the program 
counter (PC) has been changed by an exception, branch, or explicit write to R15. 


When returning from the prefetch abort fault handler, the following instruction will reload the PC 
and CPSR, then replay the previously faulting instruction: 


SUBS PC, R14_abt, #4 


3.3.2 Data Abort


The cp15 fault status and address registers are used to determine the VA and cause of the abort. The 
registers can also be written by MCR instructions, which is useful for test and debug purposes. The 
saved PC for data aborts is the actual_PC+8. 


The SA-110 supports only a subset of the architected fault status encodings listed in Table 6. The 
terminal exception is not supported. 


When returning from the data abort fault handler, the following instruction will reload the PC and 
CPSR, then replay the previously faulting data access: 


SUBS PC, R14_abt, #8 
MMU Initialization and Reset


To enable the MMU, the following steps are necessary:

• Initialize the domain and translation base address registers.
• Initialize the level 1 and level 2 (where appropriate) page tables.
• Enable the MMU (and optionally, alignment faults, WB, and I and D caches).
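
The page table initialization step can be prototyped off-target before it is committed to assembler. The C sketch below fills a level 1 table in ordinary memory: every entry is first cleared to a fault entry, then a run of 1 MB sections is mapped one-to-one. The function names are hypothetical, and the attribute values used in the example (0xC1E and 0xC12) are illustrative section attributes, not values taken from the EBSA-110 definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { L1_ENTRIES = 4096 };    /* one entry per 1 MB of virtual space */

/* Mark every level 1 entry as a fault (type bits 00). */
static inline void l1_clear(uint32_t table[L1_ENTRIES])
{
    for (size_t i = 0; i < L1_ENTRIES; i++)
        table[i] = 0;
}

/* Map `count` consecutive 1 MB sections starting at `base` so that each
 * virtual address equals its physical address. `attrs` holds everything
 * below bit 20: AP, domain, C, B, and the section type bits. */
static inline void l1_map_flat_sections(uint32_t table[L1_ENTRIES],
                                        uint32_t base, unsigned count,
                                        uint32_t attrs)
{
    assert((base & 0x000FFFFFu) == 0);     /* base is 1 MB aligned       */
    assert((attrs & 0xFFF00000u) == 0);    /* attributes below bit 20    */

    unsigned first = base >> 20;
    assert(first + count <= L1_ENTRIES);

    for (unsigned i = 0; i < count; i++)
        table[first + i] = ((uint32_t)(first + i) << 20) | attrs;
}
```

Mapping sixteen sections from address 0 mirrors the DRAM loop in Appendix A, and mapping the top quarter of the table mirrors its IO loop, assuming the IO space starts at 0xC0000000.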


To disable the MMU:

• Clean the Dcache as appropriate.
• Disable write buffering and the caches.
• Disable the MMU.
• Ensure that all cache entries are flushed where VA address conflicts may exist (valid entries
  will match as physical addresses when the MMU is disabled).


When enabling or disabling the MMU, up to three instructions will be fetched and executed prior to 
the change taking effect. The actual number is dependent on whether the instructions hit in the Icache 
or not. Icache hits will propagate down the pipeline while the MCR instruction is executing, and will 
be drained through the execution path prior to the translation change taking effect. Icache misses will 
introduce pipeline bubbles that effectively introduce NOPs to the pipeline. This must be accounted 
for in any MMU management routines where the translated and untranslated addresses differ. It is 
normal practice to enable the MMU with a direct-mapped (VA = PA) translation, at least for the pages 
used within the MMU control code. If it is necessary to change these specific pages, program control 
should enable the MMU, switch to another area of memory, then modify these pages from there. 
Similar care is required when disabling the MMU, should that be necessary. 


If the page with the MMU_enable is changed as part of this step, the MMU enabling/disabling 
routine must ensure the cp15_1 write instruction is immediately followed by the ITB and Icache 
flushes, and that they all reside on the same cacheline. This is bad programming practice, which 
should be avoided for code building/crafting reasons. 


4.1 A Worked Example


A worked example is included as Appendix A. This code was written for the EBSA-110, a 
verification and example design available as an HDK (hardware design kit) through Intel’s sales 
channels. 


HDK order number: QR-21A81-11 


This reference manual and other Intel literature may be obtained by calling 1-800-332-2717 or by
visiting Intel’s website for developers at http://developer.intel.com. The HDK includes the relevant
technical documentation, the hardware database (diskettes), and a firmware tree (diskette).


The code illustrates the following:

• A one-to-one mapping of VA and PA address spaces
• 1 MB sections for 16 MB of DRAM (2 SIMM slots)
• 64 KB pages for synchronous SRAM, ROM, and FLASH
• Cachable, bufferable access to SRAM and DRAM
• Read_only user access to ROM and FLASH (supervisors have write access, too)
• Invalid accesses configured for nonsupported memory space
• Simple noncachable, nonbufferable, aliased access to IO space
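
The 64 KB page mappings need each entry written sixteen times in the level 2 table. The C sketch below models that replication on a table held in ordinary memory. The function name, the example base address, and the attribute value 0xFFD are hypothetical choices for illustration, not values from the EBSA-110 definition files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { L2_ENTRIES = 256, SLOTS_PER_LARGE_PAGE = 16 };

/* Map `page_count` consecutive 64 KB large pages one-to-one, starting at
 * `base`. Each page entry is stored in sixteen consecutive slots because
 * the level 2 index advances every 4 KB. `attrs` holds AP3..AP0, C, B,
 * and the large page type bits. */
static inline void l2_map_large_pages(uint32_t table[L2_ENTRIES],
                                      uint32_t base, unsigned page_count,
                                      uint32_t attrs)
{
    assert((base & 0xFFFFu) == 0);         /* base is 64 KB aligned      */
    assert((attrs & 0xFFFF0000u) == 0);    /* attributes below bit 16    */

    size_t slot = (base >> 12) & 0xFFu;    /* first level 2 slot to fill */
    assert(slot + (size_t)page_count * SLOTS_PER_LARGE_PAGE <= L2_ENTRIES);

    for (unsigned page = 0; page < page_count; page++) {
        uint32_t entry = (base + ((uint32_t)page << 16)) | attrs;
        for (unsigned alias = 0; alias < SLOTS_PER_LARGE_PAGE; alias++)
            table[slot++] = entry;
    }
}
```

Two pages therefore fill 32 slots, which matches the nested page and alias loops used for SSRAM and ROM in Appendix A.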


The code assumes the MMU is disabled. It modifies the base address register before it generates 
the page table entries. 


The code was designed as part of a test harness for running demonstrations and software benchmarks 
from demon, the remote debugger supplied with ARM’s Software Development Tool kit (SDT) 
V2.0x. This required an extra level of link register preservation when entering supervisor mode from 
the demon environment, which should be self-explanatory from the comments in the code. 


The reset code used to flush the caches and TBs runs a read loop from an area of VA space reserved 
specifically for this purpose. The area is mapped to SSRAM because this provides the fastest 
access path. All other valid VA addresses are direct mapped to the same physical address. The 
original code (as shipped in the early versions of the HDK) read ROM. This is a very slow path that 
includes 8-bit to 32-bit packing. 


Note: The flushes immediately following the disable command are only guaranteed because the VA
and PA translations are the same (as described at the beginning of this section).


The code is written in ARM assembler. 


Appendix B includes source code definitions plus macro calls for all coprocessor accesses. 

Appendix A MMU Initialization and Reset


; Memory Management and Cache Initialization Routines for EBSA-110
;
; History:
;   V0.1  31-Jan-1996  DB  first draft
;   V0.2  11-Apr-1996  DB  update include file references and add conditional assembly options
;   V0.3  02-Aug-1996  DB  cleanups and "fast path" clean loop example for appnote release
;
; Routines which can be conditionally assembled to enable memory management
; with the following cache options:
;   no caches enabled
;   no caches with write buffering
;   Icache only
;   Icache only with write buffering
;   Dcache only with write buffering
;   I and Dcache with write buffering
;
; PLEASE NOTE:
;   1) Dcache with no write buffering is a nonsupported mode
;   2) Base register updated before tables assumes MMU is DISABLED
;   3) MMU enabled with S and R bits in CP15_1 both cleared
;      (AP bits govern the access making S & R "don't cares")
;   4) Page tables require to be naturally aligned to their size
;        level1 tables occupy 16KB
;        level2 tables occupy 1KB
;      this is stricter/more efficient than an alignment restriction to the page size
;
; R0 used as the default scratch data register for these routines
; R1 used as the default scratch address register for these routines
;
; Arguments: MMU_init (arg1) - IC/DC/WB enables passed as a value
;            MMU_reset ()    - nil

        INCLUDE address_map_h.s
        INCLUDE EBSA_110_defs_h.s

        EXPORT  MMU_init
        EXPORT  MMU_reset

        KEEP

        AREA    MMU_code, CODE, READONLY

; write the translation base register
; set the domain register: domain0 CLIENT, domain1-15 NO_ACCESS
; ***PLEASE NOTE*** - reassigning the base register before the tables ***
; ***                 are set up assumes the MMU is DISABLED          ***
; set up the page tables - all for domain0, flat address map
;   16 x 1MB sections for DRAM       - Read/write all
;                                    - cachable, bufferable
;   2 x 64KB large pages for SSRAM   - Read/Write all
;                                    - cachable, bufferable
;   1MB section for FLASH            - Read/Write spvr, Read_only user
;                                    - cachable, bufferable
;   8 x 64KB large pages for ROM     - access as per FLASH
;                                    - cachable
;   *SPECIAL SECTION* reserved for clean loops
;     ***dedicated VA space (1MB @ CLEAN_BASE)
;     ***mapped to SSRAM for fastest access
;     ***exception to flat map translation rule
;   no access for aliased DRAM, SSRAM, FLASH and ROM
;   IO map enabled noncachable, nonbufferable incl. aliases
; Flush ITBs and DTBs
; Flush the Icache
; Enable MMU, alignment faulting, and conditionally the Icache,
; ...Dcache and Write Buffer

;;; next line required for standalone execution only
;;; Comment out when used with .c files
;;      ENTRY

        GBLA    CP15_1_MASK


MMU_init
        MOV     R2, LR                  ; save Link Register prior to demon syscall
                                        ; needed to ensure the correct return address
                                        ; ...in a demon environment
        MRS     R1, CPSR                ; need to save the status too!!!
        SWI     SWI_EnterOS             ; Demon syscall to switch to spvr mode
        MOV     LR, R2                  ; reinstate Link Register for return
        MSR     SPSR, R1                ; reinstate saved status to the *SPSR* for correct return
        STMDB   sp!, {R4-R8}            ; Save APCS register variables on the stack

        LDR     R1, =(IC_ON + DC_ON + WB_ON)
        AND     R0, R0, R1              ; sanitize argument passed to only valid bits
        STMDB   sp!, {R0}               ; then save the argument on the stack

        LDR     R0, =Level1tab          ; TTB address is 2**14 aligned
        WRCP15_TTBase R0                ; Initialize Translation Table Base reg.

        LDR     R0, =1
        WRCP15_DAControl R0             ; Initialize Domain Access Control
;
; First clear all TT entries - FAULT
;
        LDR     R0, =0                  ; loop count
        LDR     R1, =Level1tab
        LDR     R2, =0
        LDR     R3, =L1_TABLE_ENTRIES + 2*L2_TABLE_ENTRIES
TTCLRLoop
        STR     R2, [R1], #4
        ADD     R0, R0, #1              ; increment loop count
        CMP     R3, R0
        BNE     TTCLRLoop
;
; Configure DRAM section accesses
;
        LDR     R1, =Level1tab
        LDR     R2, =DRAM_SIZE
        LDR     R3, =0                  ; loop count
DRAMLoop
        LDR     R0, =DRAM_BASE + DRAM_ACCESS
        ADD     R0, R0, R3, LSL #20     ; add section number field
        STR     R0, [R1], #4            ; store TT entry
        ADD     R3, R3, #1              ; increment loop count
        CMP     R2, R3
        BNE     DRAMLoop


;
; Configure CLEAN_LOOP special section access
;
        LDR     R1, =Level1tab + CLEAN_BASE:SHR:(20-2)
        LDR     R0, =SSRAM_BASE + FLASH_ACCESS  ; map to SSRAM with
                                                ; read_only user space
        STR     R0, [R1]
;
; Configure SSRAM section access
;
        LDR     R1, =Level1tab + SSRAM_BASE:SHR:(20-2)
        LDR     R0, =Level2tab_SSRAM + L2_CONTROL
        STR     R0, [R1]
;
; Configure FLASH section access
;
        LDR     R1, =Level1tab + FLASH_BASE:SHR:(20-2)
        LDR     R0, =FLASH_BASE + FLASH_ACCESS
        STR     R0, [R1]
;
; Configure ROM section access
;
        LDR     R1, =Level1tab + EPROM_BASE:SHR:(20-2)
        LDR     R0, =Level2tab_ROM + L2_CONTROL
        STR     R0, [R1]
;
; Configure IO section accesses to the end of memory
;
        LDR     R1, =Level1tab + IO_BASE:SHR:(20-2)
        LDR     R2, =L1_TABLE_ENTRIES/4 ; update top quartile of TT entries
        LDR     R3, =0                  ; loop count
IO_Loop
        LDR     R0, =IO_BASE + IO_ACCESS
        ADD     R0, R0, R3, LSL #20     ; add section field
        STR     R0, [R1], #4            ; store TT entry
        ADD     R3, R3, #1              ; increment loop count
        CMP     R2, R3
        BNE     IO_Loop
;
; Configure SSRAM large page accesses - 16 aliases per entry
;
        LDR     R1, =Level2tab_SSRAM
        LDR     R2, =SSRAM_PAGE_COUNT
        LDR     R3, =16
        LDR     R4, =0                  ; loop count1 (pages)
SSRAMLoop2
        LDR     R5, =0                  ; loop count2 (aliases)
        LDR     R0, =SSRAM_BASE + SSRAM_ACCESS
        ADD     R0, R0, R4, LSL #16     ; add page field
SSRAMLoop1
        STR     R0, [R1], #4            ; store TT entry
        ADD     R5, R5, #1              ; increment alias count
        CMP     R3, R5
        BNE     SSRAMLoop1              ; large page entry alias loop
        ADD     R4, R4, #1              ; increment page count
        CMP     R2, R4
        BNE     SSRAMLoop2              ; page count loop
;
; Configure ROM large page accesses - 16 aliases per entry
;
        LDR     R1, =Level2tab_ROM
        LDR     R2, =EPROM_PAGE_COUNT
        LDR     R3, =16
        LDR     R4, =0                  ; loop count1 (pages)
EPROMLoop2
        LDR     R5, =0                  ; loop count2 (aliases)
        LDR     R0, =EPROM_BASE + EPROM_ACCESS
        ADD     R0, R0, R4, LSL #16     ; add page field
EPROMLoop1
        STR     R0, [R1], #4            ; store TT entry
        ADD     R5, R5, #1              ; increment alias count
        CMP     R3, R5
        BNE     EPROMLoop1              ; loop aliases
        ADD     R4, R4, #1              ; increment page count
        CMP     R2, R4
        BNE     EPROMLoop2              ; loop for all pages

        WRCP15_FlushITB_DTB R0          ; Flush ITBs + DTBs
        WRCP15_FlushIC R0               ; Flush ICache
;
; Enable MMU, alignment faults and IC/DC/WB as required
;
        LDMIA   sp!, {R0}               ; recover argument from the stack
        LDR     R1, =EnableMMU          ; NOTE: no alignment checks enabled in this
                                        ; example
        ORR     R0, R0, R1
        WRCP15_Control R0               ; Update control register
        LDMIA   sp!, {R4-R8}            ; recover APCS register variables from the stack
        MOVS    PC, LR                  ; return to user mode
;
;end of MMU config code 


i 


77 next line used when running this source file as standalone 


; SWI SWI_Exit; Halt execution - exit back to demon 


; Function used to reset the MMU after a benchmark 
; Flushes the Icache, cleans & flushes the Dcache, disabled IC, DC, WB and 


; MMU returning the memory system to its powerup state (flat map allows this) 


; Required by demon to allow multiple loads/execution of tests without resetting 


; the debugger. 


; As with MMU_init, this function requires privileged mode to execute the 


, necessary coprocessor accesses 


MMU_reset MOV   R2, LR                  ; save Link Register prior to demon syscall
                                        ; needed to ensure the correct return address
                                        ; ...in a demon environment
        MRS     R1, CPSR                ; need to save the status too!!!
        SWI     SWI_EnterOS             ; Demon syscall to switch to supervisor mode
        MRS     R0, CPSR
        ORR     R0, R0, #0xC0
        MSR     CPSR, R0                ; disable interrupts
        MOV     LR, R2                  ; reinstate Link Register for return
        MSR     SPSR, R1                ; reinstate saved status to the *SPSR* for correct return
                                        ; (this will inherently reinstate interrupts on return too)


;; First clean the Dcache
;; Use reads from ROM to evict any dirty entries
        LDR     R0, =CLEAN_BASE         ; address for dcache loads
        ADD     R1, R0, #DCACHE_SIZE    ; compare address for dcache clean completion
CLEANloop LDR   R2, [R0], #DCACHE_LINE  ; load a dcache line and increment address pointer
        TEQ     R1, R0                  ; IF clean still in progress
        BNE     CLEANloop               ; THEN loop on dcache fills
        WRCP15_DrainWriteBuffer R0      ; Drain the write buffer

;; Reset MMU control register
        LDR     R0, =0
        WRCP15_Control R0               ; reset the control register in CP15
;
        WRCP15_FlushITB_DTB R0          ; Flush ITBs + DTBs
        WRCP15_FlushIC_DC R0            ; Flush ICache + Dcache
        NOP
        NOP
        NOP
        NOP                             ; make sure the pipeline is clear of any cached entries
        MOVS    PC, LR                  ; return to user mode


;
;;end of MMU config code

        LTORG

        AREA    TTentries, DATA, NOINIT, ALIGN=14

Level1tab       % 1:SHL:14              ; 4-byte entries, 1MB sections
Level2tab_SSRAM % 1:SHL:10              ; 4-byte entries, 64KB pages; x16 alias
Level2tab_ROM   % 1:SHL:10              ; 4-byte entries, 64KB pages; x16 alias
exit

        END
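
The large-page alias loop in the listing above can be modelled on a host machine. The C sketch below is illustrative only and is not part of the original source: `fill_large_pages` is a hypothetical helper, and the base address used in the usage note is an assumed placeholder because the real EPROM_BASE is defined elsewhere in the listing. It shows why each 64KB large page needs 16 identical level-2 entries: the level-2 table is indexed in 4KB steps, so one large page occupies 16 consecutive slots.

```c
#include <assert.h>
#include <stdint.h>

#define ALIASES_PER_LARGE_PAGE 16u  /* 64KB page / 4KB per level-2 slot */

/* Host-side model of the EPROMLoop2/EPROMLoop1 nest (hypothetical helper).
 * table      : level-2 table to fill (at least page_count * 16 words)
 * base       : physical base of the region, assumed 64KB aligned
 * access     : low-order descriptor bits (AP, C, B, type), e.g. EPROM_ACCESS
 * page_count : number of 64KB pages to map
 * Returns the number of table entries written. */
static unsigned fill_large_pages(uint32_t *table, uint32_t base,
                                 uint32_t access, unsigned page_count)
{
    unsigned n = 0;

    assert((base & 0xFFFFu) == 0);  /* large pages start on 64KB boundaries */

    for (unsigned page = 0; page < page_count; page++) {
        /* Same arithmetic as: ADD R0, R0, R4, LSL #16 */
        uint32_t entry = (base + access) + ((uint32_t)page << 16);

        /* Same descriptor stored 16 times, as in EPROMLoop1 */
        for (unsigned alias = 0; alias < ALIASES_PER_LARGE_PAGE; alias++)
            table[n++] = entry;
    }
    return n;
}
```

Under these assumptions, calling `fill_large_pages(table, 0x80000000, 0x0AA9, 8)` writes 128 entries: entries 0 to 15 hold 0x80000AA9 and entries 16 to 31 hold 0x80010AA9.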
Appendix B Coprocessor Access Macros

;Macros, constants and system variables associated with SA-110 and the EBSA-110 platform
;
;History:
; v0.1 02-Feb-1996 DB
; v0.2 11-Apr-1996 DB Add demon SWI call number definitions
; v0.3 12-Apr-1996 DB Merge definition and coprocessor macro files
; v0.4 05-Aug-1996 DB Add CLEAN_BASE for new clean loop code
;


; EBSA-110 platform data
; DRAM_SIZE assumes 2 x 8MB SIMMs fitted

DRAM_SIZE        EQU                    ; 16MB in 2 x SIMMs
SSRAM_PAGE_COUNT EQU    2               ; 2 x 64k 128KB
EPROM_PAGE_COUNT EQU    8               ; 8 x 64k 512KB
DCACHE_SIZE      EQU    0x4000          ; 16KB Dcache
DCACHE_LINE      EQU    0x20            ; 32B cache line entry
L2_CONTROL       EQU    0x1             ; domain0, page table pointer
L1_TABLE_ENTRIES EQU    0x1000          ; 16KB table (word entries)
L2_TABLE_ENTRIES EQU    0x100           ; 1KB table (word entries)
IO_BASE          EQU    0xC0000000      ; top quartile of address space
CLEAN_BASE       EQU    0x3FF00000      ; reserve VA of last section in bottom quartile
DRAM_ACCESS      EQU    0xC0E           ; AP=11, domain0, C=1, B=1
SSRAM_ACCESS     EQU    0x0FFD          ; AP=11, domain0, C=1, B=1
FLASH_ACCESS     EQU    0x80A           ; AP=10, domain0, C=1, B=0
EPROM_ACCESS     EQU    0x0AA9          ; AP=10, domain0, C=1, B=0
IO_ACCESS        EQU    0xC02           ; AP=11, domain0, C=0, B=0
;


; Definitions used in conditional assembly of Icache, Dcache and Write Buffer
; options
IC_ON            EQU    0x1000
IC_OFF           EQU    0x0
DC_ON            EQU    0x4
DC_OFF           EQU    0x0
WB_ON            EQU    0x8
WB_OFF           EQU    0x0


;*******************************************************************
;* Duplicate of information in demon subdirectory's levell_h.s     *
;* (too many redundant dependencies for reuse directly)            *
;*                                                                 *
;* SWI numbers as used in demon                                    *
;*******************************************************************

SWI_WriteC         EQU &0
SWI_Write0         EQU &2
SWI_ReadC          EQU &4
SWI_CLI            EQU &5
SWI_GetEnv         EQU &10
SWI_Exit           EQU &11
SWI_EnterOS        EQU &16
SWI_GetErrno       EQU &60
SWI_Clock          EQU &61
SWI_Time           EQU &63
SWI_Remove         EQU &64
SWI_Rename         EQU &65
SWI_Open           EQU &66
SWI_Close          EQU &68
SWI_Write          EQU &69
SWI_Read           EQU &6a
SWI_Seek           EQU &6b
SWI_Flen           EQU &6c
SWI_IsTTY          EQU &6e
SWI_TmpNam         EQU &6f
SWI_InstallHandler EQU &70
SWI_GenerateError  EQU &71


;Definitions and Macros for SA-110 Coprocessor Access
; SA-110 *only* supports Coprocessor number 15
; Only MCR and MRC coprocessor instructions are supported - the others
; ...generate an UNDEFINED exception
; CP15 registers are architected as per the ARM V4 architecture spec
; Register0     ID register                     READ_ONLY
; Register1     Control                         READ_WRITE
; Register2     Translation Table Base          READ_WRITE
; Register3     Domain Access Control           READ_WRITE
; Register4     Reserved
; Register5     Fault Status                    READ_WRITE
; Register6     Fault Address                   READ_WRITE
; Register7     Cache Operations                WRITE_ONLY
; Register8     TLB Operations                  WRITE_ONLY
; Register9-14  Reserved
; Register15    SA-110 specific tst/clk/idle    WRITE_ONLY


;; Bit definitions for the control register:
;; enables are logically OR'd with the control register
;; use bit clears (BICs) to disable functions
;; *** all bits cleared on RESET ***
EnableMMU        EQU    0x1
EnableAlignFault EQU    0x2
EnableDcache     EQU    0x4
EnableWB         EQU    0x8
EnableBigEndian  EQU    0x80
EnableMMU_S      EQU    0x100           ; selects MMU access checks
EnableMMU_R      EQU    0x200           ; selects MMU access checks
EnableIcache     EQU    0x1000


;; Defined Macros:
;
;RDCP15_ID Rx                   read of ID register
;RDCP15_Control Rx              read of Control register
;WRCP15_Control Rx              write of Control register
;RDCP15_TTBase Rx               read of Translation Table Base reg.
;WRCP15_TTBase Rx               write of Translation Table Base reg.
;RDCP15_DAControl Rx            read of Domain Access Control reg.
;WRCP15_DAControl Rx            write of Domain Access Control reg.
;RDCP15_FaultStatus Rx          read of Fault Status register
;WRCP15_FaultStatus Rx          write of Fault Status register
;RDCP15_FaultAddress Rx         read of Fault Address register
;WRCP15_FaultAddress Rx         write of Fault Address register
;WRCP15_FlushIC_DC Rx           cache control - Flush ICache + DCache
;                               Rx redundant but required for MACRO
;WRCP15_FlushIC Rx              cache control - Flush ICache
;                               Rx redundant but required for MACRO
;WRCP15_FlushDC Rx              cache control - Flush DCache
;                               Rx redundant but required for MACRO
;WRCP15_CacheFlushDentry Rx     cache control - Flush DCache entry, Rx source for VA
;WRCP15_CleanDCentry Rx         cache control - Clean DCache entry, Rx source for VA
;WRCP15_Clean_FlushDCentry Rx   cache control - Clean + Flush DCache entry, Rx source for VA
;WRCP15_DrainWriteBuffer Rx     Drain Write Buffer
;                               Rx redundant but required for MACRO
;WRCP15_FlushITB_DTB Rx         TLB control - Flush ITBs + DTBs
;                               Rx redundant but required for MACRO
;WRCP15_FlushITB Rx             TLB control - Flush ITBs
;                               Rx redundant but required for MACRO
;WRCP15_FlushDTB Rx             TLB control - Flush DTBs
;                               Rx redundant but required for MACRO
;WRCP15_FlushDTBentry Rx        TLB control - Flush DTB entry, Rx source for VA
;WRCP15_EnableClockSW Rx        test/clock/idle control - Enable Clock Switching
;                               Rx redundant but required for MACRO
;WRCP15_DisableClockSW Rx       test/clock/idle control - Disable Clock Switching
;                               Rx redundant but required for MACRO
;WRCP15_DisablenMCLK Rx         test/clock/idle control - Disable nMCLK output
;                               Rx redundant but required for MACRO
;WRCP15_WaitInt Rx              test/clock/idle control - Wait for Interrupt
;                               Rx redundant but required for MACRO
;

;Coprocessor read of ID register
        MACRO
        RDCP15_ID $reg_number
        MRC     p15, 0, $reg_number, c0, c0, 0
        MEND

;Coprocessor read of Control register
        MACRO
        RDCP15_Control $reg_number
        MRC     p15, 0, $reg_number, c1, c0, 0
        MEND

;Coprocessor write of Control register
        MACRO
        WRCP15_Control $reg_number
        MCR     p15, 0, $reg_number, c1, c0, 0
        MEND

;Coprocessor read of Translation Table Base reg.
        MACRO
        RDCP15_TTBase $reg_number
        MRC     p15, 0, $reg_number, c2, c0, 0
        MEND

;Coprocessor write of Translation Table Base reg.
        MACRO
        WRCP15_TTBase $reg_number
        MCR     p15, 0, $reg_number, c2, c0, 0
        MEND

;Coprocessor read of Domain Access Control reg.
        MACRO
        RDCP15_DAControl $reg_number
        MRC     p15, 0, $reg_number, c3, c0, 0
        MEND

;Coprocessor write of Domain Access Control reg.
        MACRO
        WRCP15_DAControl $reg_number
        MCR     p15, 0, $reg_number, c3, c0, 0
        MEND

;Coprocessor read of Fault Status register
        MACRO
        RDCP15_FaultStatus $reg_number
        MRC     p15, 0, $reg_number, c5, c0, 0
        MEND

;Coprocessor write of Fault Status register
        MACRO
        WRCP15_FaultStatus $reg_number
        MCR     p15, 0, $reg_number, c5, c0, 0
        MEND

;Coprocessor read of Fault Address register
        MACRO
        RDCP15_FaultAddress $reg_number
        MRC     p15, 0, $reg_number, c6, c0, 0
        MEND

;Coprocessor write of Fault Address register
        MACRO
        WRCP15_FaultAddress $reg_number
        MCR     p15, 0, $reg_number, c6, c0, 0
        MEND


;Coprocessor cache control
;Flush ICache + DCache
        MACRO
        WRCP15_FlushIC_DC $reg_number
        MCR     p15, 0, $reg_number, c7, c7, 0
        MEND

;Coprocessor cache control
;Flush ICache
        MACRO
        WRCP15_FlushIC $reg_number
        MCR     p15, 0, $reg_number, c7, c5, 0
        MEND

;Coprocessor cache control
;Flush DCache
        MACRO
        WRCP15_FlushDC $reg_number
        MCR     p15, 0, $reg_number, c7, c6, 0
        MEND

;Coprocessor cache control
;Flush DCache entry
        MACRO
        WRCP15_CacheFlushDentry $reg_number
        MCR     p15, 0, $reg_number, c7, c6, 1
        MEND

;Coprocessor cache control
;Clean DCache entry
        MACRO
        WRCP15_CleanDCentry $reg_number
        MCR     p15, 0, $reg_number, c7, c10, 1
        MEND

;Coprocessor cache control
;Clean + Flush DCache entry
        MACRO
        WRCP15_Clean_FlushDCentry $reg_number
        MCR     p15, 0, $reg_number, c7, c14, 1
        MEND

;Coprocessor Drain Write Buffer
        MACRO
        WRCP15_DrainWriteBuffer $reg_number
        MCR     p15, 0, $reg_number, c7, c10, 4
        MEND

;Coprocessor TLB control
;Flush ITB + DTB
        MACRO
        WRCP15_FlushITB_DTB $reg_number
        MCR     p15, 0, $reg_number, c8, c7, 0
        MEND

;Coprocessor TLB control
;Flush ITB
        MACRO
        WRCP15_FlushITB $reg_number
        MCR     p15, 0, $reg_number, c8, c5, 0
        MEND

;Coprocessor TLB control
;Flush DTB
        MACRO
        WRCP15_FlushDTB $reg_number
        MCR     p15, 0, $reg_number, c8, c6, 0
        MEND

;Coprocessor TLB control
;Flush DTB entry
        MACRO
        WRCP15_FlushDTBentry $reg_number
        MCR     p15, 0, $reg_number, c8, c6, 1
        MEND


;Coprocessor test/clock/idle control
;Enable Clock Switching
        MACRO
        WRCP15_EnableClockSW $reg_number
        MCR     p15, 0, $reg_number, c15, c1, 2
        MEND

;Coprocessor test/clock/idle control
;Disable Clock Switching
        MACRO
        WRCP15_DisableClockSW $reg_number
        MCR     p15, 0, $reg_number, c15, c2, 2
        MEND

;Coprocessor test/clock/idle control
;Disable nMCLK output
        MACRO
        WRCP15_DisablenMCLK $reg_number
        MCR     p15, 0, $reg_number, c15, c4, 2
        MEND

;Coprocessor test/clock/idle control
;Wait for Interrupt
        MACRO
        WRCP15_WaitInt $reg_number
        MCR     p15, 0, $reg_number, c15, c8, 2
        MEND

        END
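
Each macro above expands to a single MCR or MRC instruction, so the operands map directly onto fields of one 32-bit instruction word. The C sketch below is an illustration rather than part of the original source: `encode_cp_xfer` is a hypothetical helper that packs those fields using the standard ARM coprocessor register transfer layout, assuming the always (AL) condition code. It can be handy for checking a disassembly against the macro definitions.

```c
#include <stdint.h>

/* Encode an unconditional (AL) MCR or MRC instruction word.
 * is_read : 0 = MCR (ARM register to coprocessor), 1 = MRC
 * cp      : coprocessor number (15 for the SA-110 system control coprocessor)
 * opc1    : first opcode field (0 in every macro above)
 * rd      : ARM register number used as source or destination
 * crn     : coprocessor register (c0..c15)
 * crm     : second coprocessor register operand
 * opc2    : last opcode field
 * This helper is hypothetical and only models the bit layout. */
static uint32_t encode_cp_xfer(int is_read, unsigned cp, unsigned opc1,
                               unsigned rd, unsigned crn, unsigned crm,
                               unsigned opc2)
{
    return 0xEE000010u                          /* cond=AL, bits 27:24=1110, bit 4=1 */
         | ((uint32_t)(opc1 & 7u)  << 21)       /* bits 23:21 */
         | ((uint32_t)(is_read ? 1u : 0u) << 20)/* bit 20: 1 for MRC */
         | ((uint32_t)(crn  & 15u) << 16)       /* bits 19:16 */
         | ((uint32_t)(rd   & 15u) << 12)       /* bits 15:12 */
         | ((uint32_t)(cp   & 15u) << 8)        /* bits 11:8 */
         | ((uint32_t)(opc2 & 7u)  << 5)        /* bits 7:5 */
         |  (uint32_t)(crm  & 15u);             /* bits 3:0 */
}
```

For example, `encode_cp_xfer(0, 15, 0, 0, 7, 7, 0)` corresponds to `WRCP15_FlushIC_DC R0` and yields 0xEE070F17, while `encode_cp_xfer(1, 15, 0, 0, 1, 0, 0)` corresponds to `RDCP15_Control R0` and yields 0xEE110F10.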